Load the packages required for the project
library(tidyverse)
library(raster)
library(sf)
library(ggspatial)
library(ggnewscale)
library(ggsn)
library(plotly)
Set the working directory
setwd(dirname(rstudioapi::getSourceEditorContext()$path))
The data used for this analysis comes from three files. Each file contained data that needed to be joined into a single coherent data frame containing all the relevant information. The three files are the following:
crime_and_incarceration_by_state.csv - holds crime statistics and other data for each state. It is composed of 18 different variables. Only a few will be used in the analysis. They are listed below. The variables violent_crime_total and state_population will be used to calculate the crime rate. The calculation can be found in Section: 3.0.5. The “jurisdiction” and “year” columns will be renamed in Section: 3.0.4. Additional information about the processing can be found in subsequent sections.
The features that will be used from the file are:
unemployment_county.csv - This data will be processed in Section: 3.0.3. Additional information can be found in that section.
The features that will be used from the file are:
Note: All other columns will not be used and will be discarded during further processing.
The objective of this project is to determine whether there is a correlation between the crime rate and the unemployment rate. The project looks for a correlation that can be observed graphically or supported mathematically, followed by a discussion of the findings. The data topic covered in this project is the spatio-temporal change of the unemployment rate in the contiguous USA.
Steps:
This code will read in data from the data files that were discussed in Section 1
`File Name` <- c('crime_and_incarceration_by_state.csv',
'Murder Rates, States By Region_Full Data_data',
'tl_2019_us_state.shp')
file_df <- data.frame(`File Name`)
knitr::kable(file_df, col.names = c("File Name"), caption = "Files Used")
| File Name |
|---|
| crime_and_incarceration_by_state.csv |
| Murder Rates, States By Region_Full Data_data |
| tl_2019_us_state.shp |
# Read in the unemployment rate from the CSV file
Unemployrate <- read_csv("data/unemployment_county.csv")
# Read in the Crime rate from the CSV file
Crimerate <- read_csv ("data/crime_and_incarceration_by_state.csv")
# Read the states shape file
States <- st_read("data/tl_2019_us_state/tl_2019_us_state.shp")
knitr::kable(head(Unemployrate, 10), caption = "Unemployment Rate")
| County | State | Labor Force | Employed | Unemployed | Unemployment Rate | Year |
|---|---|---|---|---|---|---|
| Autauga County | AL | 24383 | 23577 | 806 | 3.3 | 2007 |
| Baldwin County | AL | 82659 | 80099 | 2560 | 3.1 | 2007 |
| Barbour County | AL | 10334 | 9684 | 650 | 6.3 | 2007 |
| Bibb County | AL | 8791 | 8432 | 359 | 4.1 | 2007 |
| Blount County | AL | 26629 | 25780 | 849 | 3.2 | 2007 |
| Bullock County | AL | 3653 | 3308 | 345 | 9.4 | 2007 |
| Butler County | AL | 9099 | 8539 | 560 | 6.2 | 2007 |
| Calhoun County | AL | 54861 | 52709 | 2152 | 3.9 | 2007 |
| Chambers County | AL | 15474 | 14469 | 1005 | 6.5 | 2007 |
| Cherokee County | AL | 11984 | 11484 | 500 | 4.2 | 2007 |
knitr::kable(head(Crimerate[ , 1:9], 10),
caption = "Crime and Incarceration by State Part 1")
| jurisdiction | includes_jails | year | prisoner_count | crime_reporting_change | crimes_estimated | state_population | violent_crime_total | murder_manslaughter |
|---|---|---|---|---|---|---|---|---|
| FEDERAL | FALSE | 2001 | 149852 | NA | NA | NA | NA | NA |
| ALABAMA | FALSE | 2001 | 24741 | FALSE | FALSE | 4468912 | 19582 | 379 |
| ALASKA | TRUE | 2001 | 4570 | FALSE | FALSE | 633630 | 3735 | 39 |
| ARIZONA | FALSE | 2001 | 27710 | FALSE | FALSE | 5306966 | 28675 | 400 |
| ARKANSAS | FALSE | 2001 | 11489 | FALSE | FALSE | 2694698 | 12190 | 148 |
| CALIFORNIA | FALSE | 2001 | 157142 | FALSE | FALSE | 34600463 | 212867 | 2206 |
| COLORADO | FALSE | 2001 | 17278 | FALSE | FALSE | 4430989 | 15492 | 158 |
| CONNECTICUT | TRUE | 2001 | 17507 | FALSE | FALSE | 3434602 | 11492 | 105 |
| DELAWARE | TRUE | 2001 | 6841 | FALSE | FALSE | 796599 | 4868 | 23 |
| FLORIDA | FALSE | 2001 | 72404 | FALSE | FALSE | 16373330 | 130713 | 874 |
knitr::kable(head(Crimerate[ , 10: 17], 10),
caption = "Crime and Incarceration by State Part 2")
| rape_legacy | rape_revised | robbery | agg_assault | property_crime_total | burglary | larceny | vehicle_theft |
|---|---|---|---|---|---|---|---|
| NA | NA | NA | NA | NA | NA | NA | NA |
| 1369 | NA | 5584 | 12250 | 173253 | 40642 | 119992 | 12619 |
| 501 | NA | 514 | 2681 | 23160 | 3847 | 16695 | 2618 |
| 1518 | NA | 8868 | 17889 | 293874 | 54821 | 186850 | 52203 |
| 892 | NA | 2181 | 8969 | 99106 | 22196 | 69590 | 7320 |
| 9960 | NA | 64614 | 136087 | 1134189 | 232273 | 697739 | 204177 |
| 1930 | NA | 3555 | 9849 | 170887 | 28533 | 121360 | 20994 |
| 639 | NA | 4183 | 6565 | 95299 | 17159 | 65762 | 12378 |
| 420 | NA | 1156 | 3269 | 27399 | 5144 | 19476 | 2779 |
| 6641 | NA | 32867 | 90331 | 782517 | 176052 | 516548 | 89917 |
knitr::kable(head(States, 10), caption = "States")
| REGION | DIVISION | STATEFP | STATENS | GEOID | STUSPS | NAME | LSAD | MTFCC | FUNCSTAT | ALAND | AWATER | INTPTLAT | INTPTLON | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 5 | 54 | 01779805 | 54 | WV | West Virginia | 00 | G4000 | A | 62266231560 | 489271086 | +38.6472854 | -080.6183274 | MULTIPOLYGON (((-81.74725 3… |
| 3 | 5 | 12 | 00294478 | 12 | FL | Florida | 00 | G4000 | A | 138947364717 | 31362872853 | +28.4574302 | -082.4091477 | MULTIPOLYGON (((-86.38865 3… |
| 2 | 3 | 17 | 01779784 | 17 | IL | Illinois | 00 | G4000 | A | 143779863817 | 6215723896 | +40.1028754 | -089.1526108 | MULTIPOLYGON (((-91.18529 4… |
| 2 | 4 | 27 | 00662849 | 27 | MN | Minnesota | 00 | G4000 | A | 206230065476 | 18942261495 | +46.3159573 | -094.1996043 | MULTIPOLYGON (((-96.78438 4… |
| 3 | 5 | 24 | 01714934 | 24 | MD | Maryland | 00 | G4000 | A | 25151726296 | 6979340970 | +38.9466584 | -076.6744939 | MULTIPOLYGON (((-77.45881 3… |
| 1 | 1 | 44 | 01219835 | 44 | RI | Rhode Island | 00 | G4000 | A | 2677787140 | 1323663210 | +41.5974187 | -071.5272723 | MULTIPOLYGON (((-71.7897 41… |
| 4 | 8 | 16 | 01779783 | 16 | ID | Idaho | 00 | G4000 | A | 214049897859 | 2391604238 | +44.3484222 | -114.5588538 | MULTIPOLYGON (((-116.8997 4… |
| 1 | 1 | 33 | 01779794 | 33 | NH | New Hampshire | 00 | G4000 | A | 23189198255 | 1026903434 | +43.6726907 | -071.5843145 | MULTIPOLYGON (((-72.3299 43… |
| 3 | 5 | 37 | 01027616 | 37 | NC | North Carolina | 00 | G4000 | A | 125925929633 | 13463401534 | +35.5397100 | -079.1308636 | MULTIPOLYGON (((-82.41674 3… |
| 1 | 1 | 50 | 01779802 | 50 | VT | Vermont | 00 | G4000 | A | 23874197924 | 1030383955 | +44.0685773 | -072.6691839 | MULTIPOLYGON (((-73.31328 4… |
Alaska, American Samoa, Northern Mariana Islands, Puerto Rico, US Virgin Islands, Hawaii, and Guam are removed from the state shapes. The project's analysis will focus only on the contiguous United States, that is, the lower 48 states.
Contiguous_state <- States %>% filter(STUSPS != "AK" & STUSPS != "AS" &
STUSPS != "MP" & STUSPS != "PR" &
STUSPS != "VI" & STUSPS != "HI" &
STUSPS != "GU")
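The same exclusion can be written more compactly with a vector of codes and `%in%`. The sketch below runs on a small made-up vector of postal codes rather than the shapefile; the names `excluded`, `codes`, and `contiguous` are assumptions for illustration only.

```r
# Postal codes dropped from the analysis (the same seven codes as the filter above)
excluded <- c("AK", "AS", "MP", "PR", "VI", "HI", "GU")

# Illustrative sample of codes; this is not the real shapefile
codes <- c("WV", "AK", "FL", "PR", "HI", "IL")

# Keep only the codes that are not in the excluded set
contiguous <- codes[!codes %in% excluded]
print(contiguous)
```

With dplyr the equivalent filter would be `filter(!STUSPS %in% excluded)`, which keeps the list of excluded codes in one place.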
The data will be filtered to remove Alaska and Hawaii, because this analysis focuses only on the contiguous United States. The data will then be grouped by state and by the year in which the data was collected. Four summary variables will be created. These variables are the following:
# Sum the county counts and average the county rates per state and year
Unemployrate <- Unemployrate %>% filter(State != 'AK' & State != "HI") %>%
group_by(State, Year) %>%
summarise(Totalforce = sum(`Labor Force`), Totalemployed=sum(Employed),
Totalunemployed=sum(Unemployed), Meanrate = mean(`Unemployment Rate`,
na.rm=TRUE))
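Meanrate is the unweighted mean of the county rates, so a small county counts as much as a large one. The sketch below contrasts it with a labor-force-weighted rate, using the first three Alabama rows from the Unemployment Rate table. The weighted rate is not part of the original analysis; it is shown only as an assumption-labeled comparison.

```r
# First three Alabama counties for 2007, copied from the table above
counties <- data.frame(
  County       = c("Autauga County", "Baldwin County", "Barbour County"),
  labor_force  = c(24383, 82659, 10334),
  unemployed   = c(806, 2560, 650),
  unemp_rate   = c(3.3, 3.1, 6.3)
)

# Unweighted mean of county rates (what Meanrate computes)
unweighted <- mean(counties$unemp_rate, na.rm = TRUE)

# Labor-force-weighted rate (illustrative alternative, not used in the report)
weighted <- sum(counties$unemployed) / sum(counties$labor_force) * 100

print(round(c(unweighted = unweighted, weighted = weighted), 2))
```

Here the small county with the high rate pulls the unweighted mean above the weighted rate, which is worth keeping in mind when reading the state-level values.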
The “State” column in this data frame will be renamed to “STUSPS”. The data set will also be filtered to the years required for this project, 2007 to 2014.
Unemployrate <- Unemployrate %>% rename("STUSPS" = "State") %>%
filter(Year %in% c(2007:2014))
In this step two columns of the crime data frame will be renamed using the rename() function. The two columns are the jurisdiction and the year columns. The “jurisdiction” column will be changed to “STUSPS”. This will aid joining the frames in a later step. Changing “year” to “Year” will help keep the naming convention consistent among the data frames that are to be used in the final project.
Crimerate <- Crimerate %>%
rename("STUSPS" = "jurisdiction") %>%
rename("Year" = "year") %>%
filter(STUSPS != "FEDERAL" & STUSPS != "ALASKA" & STUSPS != "HAWAII") %>%
filter(Year %in% c(2007:2014))
The full upper-case state names in the STUSPS column need to be converted to two-letter state abbreviations so that they match the other data frames.
Crimerate$STUSPS <- state.abb[match(str_to_title(Crimerate$STUSPS), state.name)]
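As a minimal sketch of this lookup, the example below uses R's built-in `state.name` and `state.abb` vectors on a few made-up jurisdiction values. It is a base R variant that compares names case-insensitively instead of calling `str_to_title()`, and the vector `jurisdiction` is an assumption for illustration.

```r
# Illustrative jurisdiction values in the style of the crime file
jurisdiction <- c("ALABAMA", "WEST VIRGINIA", "FEDERAL")

# Find each name's position in state.name (ignoring case), then pick the
# abbreviation at the same position in state.abb
abbrev <- state.abb[match(toupper(jurisdiction), toupper(state.name))]
print(abbrev)
```

Any value that is not one of the 50 state names, such as "FEDERAL", comes back as `NA`, so it is worth checking for `NA` values after the conversion.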
The crime rate was calculated using two columns from the Crimerate data frame. The columns were:
Crimerate <- Crimerate %>%
mutate(Crimerate=(violent_crime_total/state_population) * 100) %>%
dplyr::mutate_if(is.numeric, round, 1)
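As a worked example of this calculation, the sketch below uses the Alabama 2001 values shown in the Crime and Incarceration table. The per-100,000 figure is included only for comparison and is not computed in the original code; the variable names are assumptions for illustration.

```r
# Alabama, 2001, copied from the table above
violent_crime_total <- 19582
state_population    <- 4468912

# Rate per 100 people, as computed in the report
rate_per_100 <- violent_crime_total / state_population * 100

# Rate per 100,000 people, a common reporting convention (comparison only)
rate_per_100k <- violent_crime_total / state_population * 100000

print(round(rate_per_100, 1))
print(round(rate_per_100k, 1))
```

Because the report rounds the per-100 rate to one decimal place, states with similar rates can collapse to the same value, which explains the repeated values in the tables that follow.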
The data frames will be joined so that all the data is contained in one frame. Only unique columns will be included within the final data frame. The columns that are relevant for the final project are then selected from the joined data frame.
CS_Erate <- right_join(Contiguous_state, Unemployrate, by= c("STUSPS"))
CS_Erate_Crate <- right_join(CS_Erate, Crimerate, by= c("STUSPS", "Year"))
CS_Erate_Crate1 <- CS_Erate_Crate %>%
select(REGION, STUSPS, NAME, Year, Meanrate,Crimerate) %>%
rename("Unemplyrate"="Meanrate")
knitr::kable(head(CS_Erate_Crate1, 10), caption = "Combined Data")
| REGION | STUSPS | NAME | Year | Unemplyrate | Crimerate | geometry |
|---|---|---|---|---|---|---|
| 3 | WV | West Virginia | 2007 | 5.138182 | 0.3 | MULTIPOLYGON (((-81.74725 3… |
| 3 | WV | West Virginia | 2008 | 4.914546 | 0.3 | MULTIPOLYGON (((-81.74725 3… |
| 3 | WV | West Virginia | 2009 | 8.801818 | 0.3 | MULTIPOLYGON (((-81.74725 3… |
| 3 | WV | West Virginia | 2010 | 9.740000 | 0.3 | MULTIPOLYGON (((-81.74725 3… |
| 3 | WV | West Virginia | 2011 | 8.985454 | 0.3 | MULTIPOLYGON (((-81.74725 3… |
| 3 | WV | West Virginia | 2012 | 8.443636 | 0.3 | MULTIPOLYGON (((-81.74725 3… |
| 3 | WV | West Virginia | 2013 | 7.716364 | 0.3 | MULTIPOLYGON (((-81.74725 3… |
| 3 | WV | West Virginia | 2014 | 7.532727 | 0.3 | MULTIPOLYGON (((-81.74725 3… |
| 3 | FL | Florida | 2007 | 4.186567 | 0.7 | MULTIPOLYGON (((-86.38865 3… |
| 3 | FL | Florida | 2008 | 6.473134 | 0.7 | MULTIPOLYGON (((-86.38865 3… |
saveRDS(CS_Erate_Crate1, file = "CS_Erate_CrateCombined1.Rds")
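A right join keeps every row of the right-hand table, including rows whose key has no match on the left. The sketch below illustrates this with base R `merge(..., all.y = TRUE)` on two tiny made-up tables; the data frames `shapes` and `rates` and their values are assumptions, not the project data.

```r
# Left table: one row per state (stand-in for the state shapes)
shapes <- data.frame(STUSPS = c("WV", "FL"),
                     NAME   = c("West Virginia", "Florida"))

# Right table: one row per state and year, plus a key missing from the left
rates <- data.frame(STUSPS   = c("WV", "WV", "FL", "DC"),
                    Year     = c(2007, 2008, 2007, 2007),
                    Meanrate = c(5.1, 4.9, 4.2, 5.5))

# all.y = TRUE keeps every row of `rates`, like dplyr::right_join()
joined <- merge(shapes, rates, by = "STUSPS", all.y = TRUE)
print(joined)

# Rows with no match on the left carry NA in the left-hand columns
unmatched <- joined[is.na(joined$NAME), "STUSPS"]
print(unmatched)
```

Checking for such unmatched rows after a join is a quick way to confirm that no unexpected keys have entered the combined data.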
# Create a copy of the data frame
stats_df <- data.frame(CS_Erate_Crate1) %>% select(-geometry)
region_unemploy <- stats_df %>%
group_by(REGION) %>%
summarise(
`Region Mean` = mean(Unemplyrate),
`Maximum Unemployment Rate` = max(Unemplyrate),
`Minimum Unemployment Rate` = min(Unemplyrate),
`Quantiles Unemployment` = list(round(quantile(Unemplyrate, type=1), 2)),
`Standard Deviation` = sd(Unemplyrate),
)
knitr::kable(region_unemploy, caption = "Regional Unemployment Statistics.", align = "cccc", digits = 2)
| REGION | Region Mean | Maximum Unemployment Rate | Minimum Unemployment Rate | Quantiles Unemployment | Standard Deviation |
|---|---|---|---|---|---|
| 1 | 7.02 | 10.54 | 3.50 | 3.50, 5.40, 7.19, 8.61, 10.54 | 1.83 |
| 2 | 6.52 | 14.12 | 2.87 | 2.87, 4.39, 5.97, 8.18, 14.12 | 2.53 |
| 3 | 8.04 | 13.27 | 3.43 | 3.43, 6.46, 7.75, 9.31, 13.27 | 2.33 |
| 4 | 7.69 | 13.81 | 2.92 | 2.92, 5.43, 7.58, 9.51, 13.81 | 2.71 |
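The quantiles in this table are computed with `type = 1`, which returns observed values rather than interpolating between them. The sketch below compares it with R's default `type = 7` on a small made-up vector; the vector `x` is an assumption for illustration.

```r
# Small illustrative sample
x <- c(1, 2, 3, 4)

# type = 1: inverse of the empirical CDF, always returns an observed value
q_type1 <- quantile(x, probs = c(0.25, 0.5, 0.75), type = 1)

# type = 7: R's default, interpolates linearly between observations
q_type7 <- quantile(x, probs = c(0.25, 0.5, 0.75), type = 7)

print(q_type1)
print(q_type7)
```

With small groups, such as a handful of states per region, the two types can differ noticeably, so the type should be kept the same across tables that are compared.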
region_unemploy_box <- stats_df %>%
group_by(REGION) %>% ggplot(mapping=aes(x=REGION, y=Unemplyrate, fill=REGION))+
geom_boxplot()+
labs(colour="Year", y="Unemployment Rate", x="Region",
title="Unemployment Rate by Region") +
theme(panel.background = element_blank(), text=element_text(size=16),
plot.title=element_text(hjust=0.5, size=20))
ggplotly(region_unemploy_box)
Figure 3.1: Unemployment Rate by Region.
region_year_unemploy <-stats_df %>%
group_by(Year, REGION) %>%
summarise(
`Region Mean` = mean(Unemplyrate),
`Maximum Unemployment Rate` = max(Unemplyrate),
`Minimum Unemployment Rate` = min(Unemplyrate),
`Quantiles: 0% 25% 50% 75% 100%` =
list(round(quantile(Unemplyrate, type=1), 2)),
`Standard Deviation` = sd(Unemplyrate),
)
knitr::kable(region_year_unemploy, caption = "Regional Unemployment Statistics by Year and Region.",
align = "cccc", digits = 2)
| Year | REGION | Region Mean | Maximum Unemployment Rate | Minimum Unemployment Rate | Quantiles: 0% 25% 50% 75% 100% | Standard Deviation |
|---|---|---|---|---|---|---|
| 2007 | 1 | 4.52 | 5.34 | 3.50 | 3.50, 4.39, 4.46, 4.72, 5.34 | 0.49 |
| 2007 | 2 | 4.79 | 8.05 | 2.87 | 2.87, 3.73, 4.71, 5.33, 8.05 | 1.39 |
| 2007 | 3 | 5.03 | 7.15 | 3.43 | 3.43, 4.12, 5.00, 5.49, 7.15 | 1.13 |
| 2007 | 4 | 4.48 | 6.78 | 2.92 | 2.92, 3.43, 4.05, 5.69, 6.78 | 1.32 |
| 2008 | 1 | 5.58 | 7.20 | 3.84 | 3.84, 5.38, 5.54, 5.77, 7.20 | 0.89 |
| 2008 | 2 | 5.43 | 8.90 | 3.21 | 3.21, 3.60, 5.41, 6.26, 8.90 | 1.71 |
| 2008 | 3 | 6.10 | 8.38 | 3.69 | 3.69, 4.79, 6.21, 6.98, 8.38 | 1.38 |
| 2008 | 4 | 5.81 | 8.64 | 3.15 | 3.15, 4.60, 5.37, 7.22, 8.64 | 1.73 |
| 2009 | 1 | 8.27 | 10.22 | 6.17 | 6.17, 7.74, 8.22, 8.97, 10.22 | 1.21 |
| 2009 | 2 | 8.29 | 14.12 | 4.26 | 4.26, 5.17, 8.03, 9.95, 14.12 | 3.12 |
| 2009 | 3 | 9.80 | 13.27 | 6.48 | 6.48, 8.00, 8.80, 11.25, 13.27 | 2.18 |
| 2009 | 4 | 9.08 | 12.93 | 6.31 | 6.31, 6.49, 8.79, 11.67, 12.93 | 2.43 |
| 2010 | 1 | 8.55 | 10.54 | 5.82 | 5.82, 8.60, 8.78, 8.93, 10.54 | 1.45 |
| 2010 | 2 | 8.19 | 13.33 | 3.96 | 3.96, 5.25, 7.69, 10.16, 13.33 | 3.05 |
| 2010 | 3 | 10.20 | 13.15 | 7.16 | 7.16, 8.50, 9.74, 11.63, 13.15 | 1.82 |
| 2010 | 4 | 9.92 | 13.81 | 6.16 | 6.16, 8.48, 9.51, 12.09, 13.81 | 2.51 |
| 2011 | 1 | 8.13 | 10.40 | 5.38 | 5.38, 7.73, 8.42, 8.61, 10.40 | 1.59 |
| 2011 | 2 | 7.35 | 11.37 | 3.76 | 3.76, 5.21, 6.86, 9.09, 11.37 | 2.48 |
| 2011 | 3 | 9.60 | 12.58 | 6.20 | 6.20, 7.75, 9.31, 11.22, 12.58 | 1.81 |
| 2011 | 4 | 9.39 | 13.43 | 5.62 | 5.62, 7.53, 8.97, 11.59, 13.43 | 2.46 |
| 2012 | 1 | 7.85 | 9.79 | 5.40 | 5.40, 7.16, 8.18, 8.64, 9.79 | 1.58 |
| 2012 | 2 | 6.54 | 10.07 | 3.53 | 3.53, 4.85, 5.96, 8.07, 10.07 | 2.12 |
| 2012 | 3 | 8.62 | 11.12 | 5.52 | 5.52, 7.37, 8.53, 9.36, 11.12 | 1.55 |
| 2012 | 4 | 8.52 | 12.14 | 5.19 | 5.19, 6.29, 7.86, 10.23, 12.14 | 2.29 |
| 2013 | 1 | 7.19 | 8.70 | 4.90 | 4.90, 7.19, 7.68, 7.70, 8.70 | 1.40 |
| 2013 | 2 | 6.30 | 9.93 | 3.48 | 3.48, 4.41, 5.48, 7.75, 9.93 | 2.16 |
| 2013 | 3 | 8.03 | 10.13 | 5.65 | 5.65, 6.87, 7.72, 9.13, 10.13 | 1.29 |
| 2013 | 4 | 7.72 | 10.73 | 4.67 | 4.67, 5.53, 7.61, 9.06, 10.73 | 2.05 |
| 2014 | 1 | 6.03 | 7.22 | 4.19 | 4.19, 6.10, 6.19, 6.47, 7.22 | 1.07 |
| 2014 | 2 | 5.27 | 8.18 | 3.12 | 3.12, 3.99, 4.76, 6.21, 8.18 | 1.59 |
| 2014 | 3 | 6.95 | 8.91 | 4.81 | 4.81, 6.02, 7.07, 7.64, 8.91 | 1.11 |
| 2014 | 4 | 6.59 | 9.45 | 4.10 | 4.10, 4.74, 7.28, 7.81, 9.45 | 1.83 |
region_year_unemploy_box <-
ggplot(stats_df) + geom_boxplot(aes(x=REGION, y=Unemplyrate, fill=REGION)) +
facet_wrap(~Year, ncol=2) +
labs(colour="Year", y="Unemployment Rate", x="Region",
title="Unemployment Rate by Year and Region") +
theme_classic() +
theme(plot.title = element_text(hjust = 0.5, size=20),
text=element_text(size=16))
ggplotly(region_year_unemploy_box)
Figure 3.2: Unemployment Rate by Year and Region.
region_crime <- stats_df %>%
group_by(REGION) %>%
summarise(
`Region Mean` = mean(Crimerate),
`Maximum Crime Rate` = max(Crimerate),
`Minimum Crime Rate` = min(Crimerate),
`Quantiles: 0% 25% 50% 75% 100%` = list(quantile(Crimerate, type=1)),
`Standard Deviation` = sd(Crimerate)
)
knitr::kable(region_crime, caption = "Regional Crime Statistics.",
align = "cccc", digits = 2)
| REGION | Region Mean | Maximum Crime Rate | Minimum Crime Rate | Quantiles: 0% 25% 50% 75% 100% | Standard Deviation |
|---|---|---|---|---|---|
| 1 | 0.27 | 0.5 | 0.1 | 0.1, 0.2, 0.3, 0.4, 0.5 | 0.12 |
| 2 | 0.34 | 0.6 | 0.2 | 0.2, 0.3, 0.3, 0.4, 0.6 | 0.10 |
| 3 | 0.45 | 0.8 | 0.2 | 0.2, 0.3, 0.5, 0.5, 0.8 | 0.15 |
| 4 | 0.36 | 0.8 | 0.2 | 0.2, 0.2, 0.3, 0.4, 0.8 | 0.16 |
region_crime_box <- stats_df %>%
group_by(REGION) %>% ggplot(mapping=aes(x=REGION, y=Crimerate, fill=REGION))+
geom_boxplot() +
labs(colour="Year", y="Crime Rate", x="Region",
title="Crime Rate by Region") +
theme(panel.background = element_blank(),
plot.title=element_text(hjust=0.5, size=20), text=element_text(size=16))
ggplotly(region_crime_box)
Figure 3.3: Crime Rate by Region.
region_year_crime <-stats_df %>%
group_by(Year, REGION) %>%
summarise(
`Region Mean` = mean(Crimerate),
`Maximum Crime Rate` = max(Crimerate),
`Minimum Crime Rate` = min(Crimerate),
`Quantiles: 0% 25% 50% 75% 100%` = list(quantile(Crimerate, type=1)),
`Standard Deviation` = sd(Crimerate),
)
knitr::kable(region_year_crime, caption = "Regional Crime Statistics by Year and Region.",
align = "cccc", digits = 2)
| Year | REGION | Region Mean | Maximum Crime Rate | Minimum Crime Rate | Quantiles: 0% 25% 50% 75% 100% | Standard Deviation |
|---|---|---|---|---|---|---|
| 2007 | 1 | 0.26 | 0.4 | 0.1 | 0.1, 0.1, 0.3, 0.4, 0.4 | 0.13 |
| 2007 | 2 | 0.37 | 0.6 | 0.2 | 0.2, 0.3, 0.3, 0.5, 0.6 | 0.13 |
| 2007 | 3 | 0.52 | 0.8 | 0.3 | 0.3, 0.3, 0.5, 0.7, 0.8 | 0.18 |
| 2007 | 4 | 0.43 | 0.8 | 0.2 | 0.2, 0.3, 0.4, 0.5, 0.8 | 0.18 |
| 2008 | 1 | 0.29 | 0.5 | 0.1 | 0.1, 0.2, 0.3, 0.4, 0.5 | 0.14 |
| 2008 | 2 | 0.36 | 0.5 | 0.2 | 0.2, 0.3, 0.3, 0.4, 0.5 | 0.10 |
| 2008 | 3 | 0.52 | 0.7 | 0.3 | 0.3, 0.3, 0.5, 0.7, 0.7 | 0.16 |
| 2008 | 4 | 0.39 | 0.7 | 0.2 | 0.2, 0.2, 0.3, 0.5, 0.7 | 0.19 |
| 2009 | 1 | 0.29 | 0.5 | 0.1 | 0.1, 0.2, 0.3, 0.4, 0.5 | 0.14 |
| 2009 | 2 | 0.34 | 0.5 | 0.2 | 0.2, 0.3, 0.3, 0.4, 0.5 | 0.11 |
| 2009 | 3 | 0.48 | 0.7 | 0.2 | 0.2, 0.3, 0.5, 0.6, 0.7 | 0.15 |
| 2009 | 4 | 0.36 | 0.7 | 0.2 | 0.2, 0.2, 0.3, 0.5, 0.7 | 0.17 |
| 2010 | 1 | 0.29 | 0.5 | 0.1 | 0.1, 0.2, 0.3, 0.4, 0.5 | 0.14 |
| 2010 | 2 | 0.32 | 0.5 | 0.2 | 0.2, 0.2, 0.3, 0.4, 0.5 | 0.11 |
| 2010 | 3 | 0.44 | 0.6 | 0.2 | 0.2, 0.3, 0.4, 0.5, 0.6 | 0.14 |
| 2010 | 4 | 0.35 | 0.7 | 0.2 | 0.2, 0.2, 0.3, 0.4, 0.7 | 0.16 |
| 2011 | 1 | 0.27 | 0.4 | 0.1 | 0.1, 0.2, 0.3, 0.4, 0.4 | 0.12 |
| 2011 | 2 | 0.31 | 0.4 | 0.2 | 0.2, 0.2, 0.3, 0.4, 0.4 | 0.08 |
| 2011 | 3 | 0.43 | 0.6 | 0.2 | 0.2, 0.3, 0.4, 0.5, 0.6 | 0.14 |
| 2011 | 4 | 0.34 | 0.6 | 0.2 | 0.2, 0.2, 0.3, 0.4, 0.6 | 0.15 |
| 2012 | 1 | 0.28 | 0.4 | 0.1 | 0.1, 0.2, 0.3, 0.4, 0.4 | 0.12 |
| 2012 | 2 | 0.33 | 0.5 | 0.2 | 0.2, 0.3, 0.3, 0.4, 0.5 | 0.10 |
| 2012 | 3 | 0.44 | 0.6 | 0.2 | 0.2, 0.3, 0.5, 0.5, 0.6 | 0.13 |
| 2012 | 4 | 0.34 | 0.6 | 0.2 | 0.2, 0.2, 0.3, 0.4, 0.6 | 0.15 |
| 2013 | 1 | 0.27 | 0.4 | 0.1 | 0.1, 0.2, 0.3, 0.3, 0.4 | 0.11 |
| 2013 | 2 | 0.33 | 0.5 | 0.2 | 0.2, 0.3, 0.3, 0.4, 0.5 | 0.08 |
| 2013 | 3 | 0.41 | 0.6 | 0.2 | 0.2, 0.3, 0.4, 0.5, 0.6 | 0.12 |
| 2013 | 4 | 0.34 | 0.6 | 0.2 | 0.2, 0.2, 0.3, 0.4, 0.6 | 0.15 |
| 2014 | 1 | 0.24 | 0.4 | 0.1 | 0.1, 0.2, 0.2, 0.3, 0.4 | 0.11 |
| 2014 | 2 | 0.32 | 0.4 | 0.2 | 0.2, 0.3, 0.3, 0.4, 0.4 | 0.06 |
| 2014 | 3 | 0.40 | 0.6 | 0.2 | 0.2, 0.3, 0.4, 0.5, 0.6 | 0.12 |
| 2014 | 4 | 0.34 | 0.6 | 0.2 | 0.2, 0.2, 0.3, 0.4, 0.6 | 0.15 |
region_year_crime_box <-
ggplot(stats_df) + geom_boxplot(aes(x=REGION, y=Crimerate, fill=REGION)) +
facet_wrap(~Year, ncol=2) +
labs(colour="Year", y="Crime Rate", x="Region",
title="Crime Rate by Year and Region") +
theme_classic() +
theme(plot.title = element_text(hjust = 0.5, size=20),
text=element_text(size=16))
ggplotly(region_year_crime_box)
Figure 3.4: Crime Rate by Year and Region.
The data visualizations that were produced for the project were the following:
Data for the graphs is loaded from the RDS file that was created in a previous section of the project. The name of the “.Rds” file is:
This file will be read in using readRDS(). The data in it will then be used to create the plots in this section of the project.
Read the cleaned data from the “.Rds” file.
all_info_from_RDS <- readRDS("CS_Erate_CrateCombined1.Rds")
This is a map of the unemployment rate for the year 2014. The map is built with ggplot2, and the text aesthetic holds the hover labels for an interactive version made with plotly.
Only the year 2014 will be plotted on this map. This data will be filtered from all_info_from_RDS.
Note: This step could have been done using a pipe, but this makes it easier to see what is going on.
info_for_year_2014 <- all_info_from_RDS %>% filter(Year == 2014)
Using the info_for_year_2014 data frame a graph of the contiguous United States will be created showing unemployment rate as a layer on the graph.
sp1 <- ggplot(data=info_for_year_2014) +
# geom_sf() picks up the geometry column of the sf data frame automatically
geom_sf(aes(fill=Unemplyrate,
text=paste("State: ", NAME,
"\nUnemployment Rate: ",
round(Unemplyrate, 2)))) +
xlab("Longitude") +
ylab("Latitude") +
guides(fill=guide_legend(title= "Unemployment Rate for 2014")) +
labs(title = "Unemployment Rate Over Contiguous USA ",
subtitle = "Unemployment Color Coded by State",
caption = "Data source: Unknown") +
scalebar(data= info_for_year_2014, location="bottomleft", dist= 500, st.size=2,
dist_unit = "km", transform= TRUE, model= "WGS84", st.dist=0.04) +
annotation_north_arrow(location = "br", which_north = "true",
style = north_arrow_fancy_orienteering) +
theme(panel.background = element_blank(), legend.position = "right",
plot.title = element_text(hjust = 0.5, size=20),
plot.subtitle = element_text(hjust = 0.5, size=16),
text=element_text(size=16))
sp1
Figure 3.5: A spatial map over the contiguous USA for the unemployment rate for the year 2014
Using the info_for_year_2014 data frame a graph of the contiguous United States will be created showing crime rate as a layer on the graph.
ggplot(data=info_for_year_2014) +
# geom_sf() picks up the geometry column of the sf data frame automatically
geom_sf(aes(fill=Crimerate)) +
xlab("Longitude") +
ylab("Latitude") +
guides(fill=guide_legend(title= "Crime Rate for 2014")) +
labs(title = "Crime Rate Over Contiguous USA ",
subtitle = "Crime Rate Color Coded by State",
caption = "Data source: Unknown") +
scalebar(data= info_for_year_2014, location="bottomleft", dist= 500, st.size=2,
dist_unit = "km", transform= TRUE, model= "WGS84", st.dist=0.04) +
annotation_north_arrow(location = "br", which_north = "true",
style = north_arrow_fancy_orienteering) +
theme(panel.background = element_blank(),
plot.title = element_text(hjust = 0.5, size=20),
plot.subtitle = element_text(hjust = 0.5, size=16),
text=element_text(size=16))
Figure 3.6: Spatial map over the contiguous USA for the crime rate for the year 2014
Creates a scatter plot using crime rate (x-axis) and unemployment rate (y-axis).
fig <- plot_ly(data= info_for_year_2014, x= ~Crimerate, y= ~Unemplyrate,
color= ~REGION) %>%
add_markers() %>%
layout(title="<b>Unemployment Rate and Crime Rate for 2014 </b>",
margin=list(b = 10, l= 10)) %>%
layout(xaxis=list(title= "<b>Crime Rate Per 100 People</b>"),
yaxis=list(title="<b>Unemployment Rate Per 100 People </b>"),
legend=list(title=list(text='<b> Region </b>'),
showlegend=TRUE)) %>%
layout(xaxis=list(titlefont= list(size= 14)),
yaxis=list(titlefont= list(size= 14)))
fig
Figure 3.7: Scatter plot for the data relationship between the unemployment rate and crime rate
This will be an interactive plot of the unemployment rate for four states:
Steps to create the time series plot:
The data for Section 3.3 is filtered from the all_info_from_RDS data frame into a new data frame. A vector of state names defines the states that are to be plotted on the graph. These states will be used for this time series plot and the one that follows.
states <- c("California", "Idaho", "Illinois", "Indiana")
four_states_year_2014 <- all_info_from_RDS %>% filter(NAME %in% states)
stats_df <- as.data.frame(four_states_year_2014)
une <- plot_ly(data=stats_df, x= ~as.factor(Year), y= ~Unemplyrate,color= ~NAME) %>%
filter(NAME %in% states) %>%
group_by(NAME) %>%
add_lines() %>%
layout(title="<b>Unemployment Rate Changes by Year</b>",
xaxis=list(title= "<b>Year</b>"),
yaxis=list(title="<b>Unemployment Rate</b>"),
legend=list(title=list(text='<b> State </b>'), showlegend=TRUE))
une
Figure 3.8: Unemployment rate time series plot.
Note: To better see the crime rate for California select it from the legend on the right of the plot.
cr <- plot_ly(data=stats_df, x= ~as.factor(Year), y= ~Crimerate, color= ~NAME) %>%
filter(NAME %in% states) %>%
group_by(NAME) %>%
add_lines() %>%
layout(title="<b>Crime Rate Changes by Year</b>",
xaxis=list(title= "<b>Year</b>"),
yaxis=list(title="<b>Crime Rate</b>", range=c(0, 0.7)),
legend=list(title=list(text='<b> State </b>'), showlegend=TRUE))
cr
Figure 3.9: Crime rate time series plot.
(See “Spatial unemployment” Figure 3.5)
(See “Spatial crime” Figure 3.6)
Reviewing the scatter plot, we see that it does not show a linear correlation between the crime rate and the unemployment rate. The variables do not appear to depend on each other: if we change one variable, we cannot assume that a change will be seen in the other (see “Scatter” Figure 3.7) (Question Video: Identifying the Linear Correlation from the Scattergraph, Nagwa, n.d.).
The correlation coefficient is 0.17. This value is close to zero, which indicates at most a weak positive relationship and no meaningful correlation between the crime and unemployment rates. For there to be a strong correlation, the value needs to fall close to either -1 or 1. Values close to -1 indicate a negative correlation: if one of the variables increases, the other decreases, and vice versa. Values close to 1 indicate a positive correlation, where the variables increase or decrease together (Soetewey, 2020).
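A coefficient like this can be obtained with `cor()` and tested with `cor.test()`. The sketch below uses small made-up vectors, not the project data, so its result will not be 0.17; the names `crime` and `unemployment` and their values are assumptions for illustration.

```r
# Hypothetical values for six states (illustration only)
crime        <- c(0.2, 0.3, 0.3, 0.4, 0.5, 0.6)
unemployment <- c(6.1, 4.8, 7.2, 5.0, 6.9, 5.5)

# Pearson correlation coefficient, always between -1 and 1
r <- cor(crime, unemployment, method = "pearson")

# The same coefficient from its definition: covariance scaled by both spreads
dx <- crime - mean(crime)
dy <- unemployment - mean(unemployment)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Test of the null hypothesis that the true correlation is zero
test <- cor.test(crime, unemployment, method = "pearson")

print(r)
print(test$p.value)
```

On the project data, the two columns of info_for_year_2014 would take the place of the hypothetical vectors, and the p-value would indicate whether a coefficient of that size is distinguishable from zero.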
(See “Crime rate” Figure 3.9)
(See “Unemployment Rate” Figure 3.8)
Nagwa. (n.d.). Question Video: Identifying the Linear Correlation from the Scattergraph [Video]. Nagwa. https://www.nagwa.com/en/videos/909167139353/
Soetewey, A. (2020, May 28). Correlation coefficient and correlation test in R. Stats and R. Retrieved February 13, 2025, from https://statsandr.com/blog/correlation-coefficient-and-correlation-test-in-r/